(57) Abstract

The present invention relates to a method and a device for stereo acoustic echo cancellation. Acoustic echo cancellation in stereo is
considerably more difficult than echo cancellation in mono, due to the strong correlation between the stereo channels. The invention is based
on the use of a perceptual audio coder to reduce the correlation between the stereo channels without introducing audible distortion. As a
result, the stereo canceller converges towards the correct echo paths and therefore gives a more stable echo cancellation which is
not dependent on the transmission room (far-end). The core of the invention is that the correlation can be reduced beyond what
the audio coder itself achieves, by modifying its decoder. Extra noise, uncorrelated between the channels, is added in the decoder to such
an extent that it is not audible, by using information provided by the coder in combination with an estimated perceptual masking
threshold. The solution is consequently flexible and does not require the coding standard in use to be changed in any way. Only
a small number of operations need to be included in the decoder.
TITLE OF THE INVENTION: METHOD AND DEVICE AT STEREO
ACOUSTIC ECHO CANCELLATION

TECHNICAL FIELD

The present invention relates to echo cancellation in
combination with signal coding.

TECHNICAL PROBLEM 

Acoustic echo cancellation in stereo channels is a more
difficult problem than the corresponding mono case. This is
because each channel carries similar speech signals, which
causes problems for the adaptive algorithm that is used.
The expected fields of application for stereo cancellation
are high-quality video conference systems and tele-games.
These fields, however, place different demands on quality,
bandwidth etc.

In the mono case, NLMS (the Normalized Least Mean Square
algorithm) is used exclusively, due to its robustness
against noise and signal variations (non-stationarity).
The disadvantage of this algorithm is that its convergence
depends on the spectral characteristics of the incoming
(far-end) signals. A signal that is strongly autocorrelated
in time gives slow convergence, and vice versa. In the
stereo case, the speech signal is correlated in time but
also between the channels, which slows down the convergence
of NLMS to such an extent that it becomes useless. Echo
cancellation must then be performed with some other kind of
algorithm than NLMS. Essentially there are two types of
algorithms to choose from: sub-band algorithms or
full-length RLS (Recursive Least Squares). These of course
have different advantages and disadvantages in
implementation. The channel correlation also means that
there is no unique theoretical estimate of the echo paths
towards which the echo canceller converges, but many
solutions which all depend on the transmitter room (far
end), Figure 1. This results in unstable echo cancellation,
and the echo canceller diverges at irregular intervals. To
make the echo canceller converge in a stable way towards
the correct echo paths, the stereo signals have to be
modified before they reach the echo canceller as reference
signals.
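As an illustration of the mono NLMS adaptation referred to above, the following is a minimal sketch in Python. The echo path, the step size `mu`, the regularization `eps` and the white-noise far-end signal are assumptions chosen for the example and are not taken from the invention.

```python
import random

def nlms(x, d, taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights w so that x filtered by w tracks d (mono NLMS)."""
    w = [0.0] * taps
    buf = [0.0] * taps                      # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y_hat                      # residual echo
        power = sum(b * b for b in buf) + eps
        # Normalized update: the step is scaled by the input power.
        w = [wi + mu * e * bi / power for wi, bi in zip(w, buf)]
    return w

rng = random.Random(0)
echo_path = [0.5, -0.3, 0.1]                # hypothetical near-end echo path
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # white far-end signal
# Microphone signal: the far-end signal filtered by the echo path.
d = [sum(echo_path[k] * x[n - k] for k in range(len(echo_path)) if n - k >= 0)
     for n in range(len(x))]
w_est = nlms(x, d, taps=3)
```

With a white, single-channel input the estimate approaches the assumed echo path quickly; a strongly correlated input would slow this down, which is the behaviour described in the paragraph above.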



Stereo cancellation involves the following set of problems:

* The echo paths w1(n), w2(n), Figure 1, in the near end,
N, which are to be estimated by the AEC, are not uniquely
identifiable from measured data.

* The echo cancellation of the canceller depends on the
variability of the channels g1(n), g2(n) in the far end, F.

Assume that the signals from the microphones of the far end
are given by, Figure 3,

$$x_i(n) = g_i(n) * s(n), \quad i = 1, 2$$

where $s(n)$ is the source signal and $g_i(n)$, $i = 1, 2$,
are the echo paths of the far end with the length M; "*"
denotes convolution.
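The far-end model can be sketched numerically as follows. The source samples and the two short far-end responses are hypothetical values. The sketch also checks the relation $x_1 * g_2 = x_2 * g_1$, which follows from the commutativity of convolution and expresses the linear dependence between the two channels.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

s = [1.0, -0.5, 0.25, 0.0, 2.0]   # hypothetical source signal s(n)
g1 = [1.0, 0.4]                   # hypothetical far-end path to microphone 1
g2 = [0.7, -0.2]                  # hypothetical far-end path to microphone 2

x1 = convolve(g1, s)              # x_1(n) = g_1(n) * s(n)
x2 = convolve(g2, s)              # x_2(n) = g_2(n) * s(n)

# Both channels are filtered versions of the same source, so
# filtering x1 with g2 gives the same result as filtering x2 with g1.
lhs = convolve(x1, g2)
rhs = convolve(x2, g1)
max_diff = max(abs(a - b) for a, b in zip(lhs, rhs))
```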



The residual echo after the echo canceller is

$$e(n) = y(n) - \hat{h}_{1,L}^T x_{1,L}(n) - \hat{h}_{2,L}^T x_{2,L}(n)$$

$$y(n) = h_{1,N}^T x_{1,N}(n) + h_{2,N}^T x_{2,N}(n)$$

$$h_{i,N} = [h_{i,0}, \ldots, h_{i,N-1}]^T$$

$$x_{i,N}(n) = [x_i(n), \ldots, x_i(n-N+1)]^T$$

where $h_{i,N}$, $i = 1, 2$, is the real response of length N
from the near end, and $\hat{h}_{i,L}$, $i = 1, 2$, is the
estimated response of length L.

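Minimization of the weighted least squares criterion

$$J(n) = \sum_{l=1}^{n} \lambda^{n-l} |e(l)|^2, \quad 0 < \lambda \le 1$$

results in the solution of the linear equation system

$$R_{xx}(n) \begin{bmatrix} \hat{h}_{1,L} \\ \hat{h}_{2,L} \end{bmatrix} = r_{xx}(n)$$

where $r_{xx}(n)$ is the estimated cross-correlation vector,
and $R_{xx}(n)$ is the correlation matrix,

$$R_{xx}(n) = \sum_{l=1}^{n} \lambda^{n-l} \begin{bmatrix} x_{1,L}(l) x_{1,L}^T(l) & x_{1,L}(l) x_{2,L}^T(l) \\ x_{2,L}(l) x_{1,L}^T(l) & x_{2,L}(l) x_{2,L}^T(l) \end{bmatrix}$$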
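To make the role of the correlation matrix concrete, the following sketch builds the exponentially weighted matrix for the smallest possible case, L = 1, where it reduces to a 2x2 matrix. The signals, the forgetting factor and the noise level are assumptions for illustration. When the second channel is a scaled copy of the first, the determinant vanishes; a small amount of independent noise in each channel makes the matrix invertible.

```python
import random

def correlation_matrix_l1(x1, x2, lam=1.0):
    """Weighted 2x2 correlation matrix for filter length L = 1."""
    n = len(x1)
    r11 = r12 = r22 = 0.0
    for l in range(n):
        weight = lam ** (n - 1 - l)         # newest sample has weight 1
        r11 += weight * x1[l] * x1[l]
        r12 += weight * x1[l] * x2[l]
        r22 += weight * x2[l] * x2[l]
    return [[r11, r12], [r12, r22]]

def relative_determinant(r):
    """Determinant normalized by the product of the diagonal terms."""
    return (r[0][0] * r[1][1] - r[0][1] * r[1][0]) / (r[0][0] * r[1][1])

rng = random.Random(1)
s = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Linearly dependent channels: x2 is a scaled copy of x1.
x1 = [1.0 * v for v in s]
x2 = [0.5 * v for v in s]
dependent = relative_determinant(correlation_matrix_l1(x1, x2, lam=0.99))

# The same channels with a little independent noise added to each.
x1n = [v + 0.1 * rng.gauss(0.0, 1.0) for v in x1]
x2n = [v + 0.1 * rng.gauss(0.0, 1.0) for v in x2]
noisy = relative_determinant(correlation_matrix_l1(x1n, x2n, lam=0.99))
```

The normalized determinant equals one minus the squared correlation coefficient between the channels, so it is a simple indicator of how well conditioned this small system is.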

The problem in stereophonic echo cancellation is the
condition number of this matrix. It has further been shown
that

$$L > M \Rightarrow R_{xx}(n) \text{ is singular } \forall n$$

$$L < M \Rightarrow R_{xx}(n) \text{ is poorly conditioned}$$

$$L > N \Rightarrow \text{misalignment } \varepsilon(n) \to 0, \; n \to \infty$$

$$L < N \Rightarrow \text{misalignment } \varepsilon(n) > 0 \; \forall n$$

where the misalignment is

$$\varepsilon(n) = \|h - \hat{h}\|^2 / \|h\|^2$$

with $h = [h_{1,L}^T \; h_{2,L}^T]^T$ and
$\hat{h} = [\hat{h}_{1,L}^T \; \hat{h}_{2,L}^T]^T$.

A poorly conditioned $R_{xx}(n)$ increases the misalignment.
There is consequently a conflict: the problem is better
conditioned if L << M, whereas the misalignment is reduced
if L > N, but in practice L < M = N. The way to resolve this
is to reduce the correlation between the stereo channels.
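The misalignment measure, the squared norm of the difference between the true and the estimated stacked responses divided by the squared norm of the true response, can be computed as in the following sketch. The numerical responses are hypothetical.

```python
def misalignment(h_true, h_est):
    """Normalized misalignment ||h - h_est||^2 / ||h||^2 of stacked responses."""
    num = sum((a - b) ** 2 for a, b in zip(h_true, h_est))
    den = sum(a * a for a in h_true)
    return num / den

# Hypothetical true echo paths of the two channels, stacked into one vector.
h1 = [0.5, -0.3]
h2 = [0.2, 0.1]
h = h1 + h2

perfect = misalignment(h, h)                       # exact estimate
untrained = misalignment(h, [0.0] * len(h))        # all-zero estimate
slightly_off = misalignment(h, [0.5, -0.3, 0.2, 0.0])
```

A value of 0 means that the echo paths have been identified exactly, and a value of 1 corresponds to an all-zero estimate.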

The eigenvalues of the correlation matrix are bounded from
below by $[1 - |\gamma(f)|^2]$, where $\gamma(f)$ is the
coherence between the stereo channels. The misalignment can
therefore be assessed with the coherence function, which
then serves as a measure of the achieved decorrelation.
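One possible way to estimate the squared coherence between the channels is to average cross- and auto-spectra over consecutive segments, as in the sketch below. The segment length, the rectangular window, the plain DFT and the test signals are assumptions made for the example; they are not prescribed by the invention.

```python
import cmath
import random

def dft(frame):
    """DFT of a real frame, bins 0 .. len(frame)/2."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n // 2 + 1)]

def squared_coherence(x1, x2, seg=32):
    """Segment-averaged estimate of |gamma(f)|^2 per frequency bin."""
    nbins = seg // 2 + 1
    s11 = [0.0] * nbins
    s22 = [0.0] * nbins
    s12 = [0j] * nbins
    for start in range(0, min(len(x1), len(x2)) - seg + 1, seg):
        a = dft(x1[start:start + seg])
        b = dft(x2[start:start + seg])
        for k in range(nbins):
            s11[k] += abs(a[k]) ** 2
            s22[k] += abs(b[k]) ** 2
            s12[k] += a[k] * b[k].conjugate()
    return [abs(s12[k]) ** 2 / (s11[k] * s22[k]) for k in range(nbins)]

rng = random.Random(2)
x1 = [rng.gauss(0.0, 1.0) for _ in range(512)]
x2_dependent = [0.5 * v for v in x1]                 # scaled copy of x1
x2_independent = [rng.gauss(0.0, 1.0) for _ in range(512)]

coh_dependent = squared_coherence(x1, x2_dependent)
coh_independent = squared_coherence(x1, x2_independent)
# Lower bound on the eigenvalues, 1 - |gamma(f)|^2, per bin.
bound_independent = [1.0 - c for c in coh_independent]
```

For linearly dependent channels the estimate is 1 in every bin, so the bound collapses to zero; for independent channels the estimate is small and the bound stays close to 1.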

The present invention is therefore intended to solve the
above-mentioned problems.

PRIOR ART 

Two important applications for stereo acoustic echo
cancellation are high-quality video conferencing and
tele-games. In the future, desktop-based conference systems
will also need stereo acoustic echo cancellers (AEC). These
systems place different demands on bandwidth, bit rate etc.

Stereo acoustic echo cancellation, however, has turned out
to be more complicated than the single-channel case. This is
because, in the two-channel case, the signals are linearly
dependent, which results in convergence problems for the
echo canceller. Because of the linear dependence between
the channels, there is theoretically no unique solution for
the echo canceller to identify. Furthermore, all the
non-unique solutions depend on the echo paths at the far end
of the connection, F. In real situations, however, the
problem is not singular but only poorly conditioned, due to
uncorrelated microphone noise and infinitely long impulse
responses of the echo paths of the far end. The convergence
rate of the NLMS algorithm depends to a great extent on the
condition number of the system, so more sophisticated
algorithms are needed for stereo acoustic echo cancellation.

Even with more sophisticated algorithms, problems remain
with unstable estimates of the echo paths. In order to
stabilize the solution, the correlation between the stereo
channels must be reduced without introducing disturbing
distortion. Different solutions to this have been presented,
but have been rejected for different reasons (see for
instance M.M. Sondhi, D.R. Morgan and J.L. Hall:
"Stereophonic acoustic echo cancellation - an overview of
the fundamental problem"; IEEE Signal Processing Letters,
2(8):148-151, 1995). The most promising solution at present
is to distort the stereo channels non-linearly (for
instance J. Benesty, D.R. Morgan and M.M. Sondhi: "A better
understanding and an improved solution to the problem of
stereophonic acoustic echo cancellation"; IEEE Trans. on
Speech and Audio Processing, to appear; a short version can
be found in Proc. of ICASSP 1997, pp. 303-306), where
half-wave rectified parts of the signal are added to the
signal itself. This distortion does not destroy the
stereophonic perception, but introduces noise which for the
most part is inaudible but can be noticed depending on the
degree of non-linearity.
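The prior-art non-linear distortion can be sketched as follows. The function names, the parameter `alpha` and the choice of using the positive half-wave for one channel and the negative half-wave for the other are assumptions made for illustration; the text above only states that half-wave rectified parts of the signal are added to the signal itself.

```python
def add_positive_halfwave(x, alpha):
    """Add alpha times the positive half-wave rectified signal to x."""
    return [v + alpha * (v + abs(v)) / 2.0 for v in x]

def add_negative_halfwave(x, alpha):
    """Add alpha times the negative half-wave rectified signal to x."""
    return [v + alpha * (v - abs(v)) / 2.0 for v in x]

# Hypothetical identical stereo samples and a hypothetical alpha.
left = [1.0, -1.0, 0.5, -0.5]
right = [1.0, -1.0, 0.5, -0.5]
alpha = 0.5

left_out = add_positive_halfwave(left, alpha)    # only positive samples change
right_out = add_negative_halfwave(right, alpha)  # only negative samples change
```

After the distortion the two channels are no longer related by a purely linear filter, which is what reduces the linear dependence; a larger `alpha` decorrelates more but makes the distortion easier to hear.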

When acoustic signals are transmitted between parties in,
for instance, a telecommunication connection, a certain
part of one's own sound is returned and creates an echo. In
most cases one wants this echo reduced at least to a level
which is not disturbing. This is achieved by means of a
so-called echo canceller. The principle is that the part of
the own signal is identified and subtracted from the
received signal. It is consequently known to use echo
cancellation in the mono case. Previously known principles
are used for this, which are described, among other places,
in the patent literature, for instance US 5668865,
US 5664011, US 5610909. In the patent documents US 5661813,
US 574545, US 5323459, US 5369554, US 5555310 and
US 5513265, the problems of stereo acoustic echo
cancellation are dealt with more specifically.

THE SOLUTION 

The present invention relates to a method for stereo
acoustic echo cancellation, where the echo is created on a
connection for transmission of a stereo acoustic signal.
The signal is coded on the transmitter side, F, and decoded
on the receiver side, N. Perceptual audio coding is
introduced. By perceptual coding is meant coding which
exploits that the signal can consist of different frequency
components transmitted at the same time, where one
component dominates over another so that the latter gives
no additional contribution to the received information.
Furthermore, the side information of the coded signal is
used. The echo can then be identified and cancelled. By
using, for instance, MPEG coding, a perceptual coding is
achieved which reduces the channel correlation between the
stereo channels. At frequencies above 2 kHz the perceptual
coding is advantageous. Below 2 kHz, the side information
can be used to further reduce the correlation. For each
sub-band into which the signal is divided, the utilization
of the signal and the quantizer used at the coding are
indicated. The quantizer is selected at the coding on the
basis of an analysed segment of the signal. Further, a
masking threshold is determined which defines the
distortion levels that cannot be heard within the segment.
The masking threshold is selected so that a margin to the
just noticeable distortion is attained. Noise which is
uncorrelated between the channels is added within this
margin in the decoder, whereby an improved echo
cancellation can be achieved.

The invention further relates to a device for stereo
acoustic echo cancellation. Sound registering equipment on
the transmitter side, F, registers the signal, which is
coded in a coder, C, and transmitted to a decoder, D, on
the receiver side, N. In the coder, C, a perceptual coding
of the signal is performed. Side information in the coded
signal is further used. A stereo acoustic echo canceller,
AEC, is used to identify the echo and cancel it. Perceptual
coding is performed in the coder, C, by using, for
instance, MPEG coding to reduce the channel correlation
between the channels. The decoder analyses a segment of the
signal to determine a masking threshold which defines
inaudible distortion levels within the segment. The coder,
C, further selects the quantizer, dq. The masking threshold
is determined in the decoder in such a way that a margin to
the just noticeable distortion is attained. Noise which is
uncorrelated between the channels is added to the signal by
the decoder.

ADVANTAGES 

The invention makes it possible to carry out echo
cancellation on connections over which stereo transmissions
are made. The invention can be introduced without adding
extra equipment, which may be expensive. The use of
perceptual coders/decoders makes it possible to implement
the solution on the decoder side, without the coder needing
to know about it. The solution further has the advantage
that good conditioning is attained without introducing
distortion which can interfere with the communication.

DESCRIPTION OF FIGURES 

Figure 1 illustrates microphones and loudspeakers at the
near end, N, and the far end, F, respectively. Within the
frame with the broken line is the acoustic echo canceller
(AEC) for the stereo case. Only one of the return channels
is shown.

Figure 2 illustrates the far-end room, the stereo acoustic
echo canceller (stereo AEC) and the perceptual audio coder,
C/D (coder/decoder).

Figure 3 illustrates an MPEG-1 layer III decoder. The 
following designations have been used: 

pi: PCM input
af: Filter bank analysis
md: MDCT
sq: Scaling device and quantizer
hc: Huffman coding
mp: Multiplexer
dm: Demultiplexer
hd: Huffman decoding
dd: Dequantizer and descaling device
im: Inverse MDCT
sfb: Synthesis filter bank
po: PCM output
dt: Decide masking thresholds
si: Side information
di: Decoding of side information
b: MPEG layer III bit stream

Figure 4 illustrates the masking threshold. The dotted areas
are masked by the tone. The sound pressure is indicated in
dB. The frequency is indicated on a log scale. DN signifies
the decorrelating noise level. Q signifies the quantizing
noise level.

PREFERRED EMBODIMENT 

In the following, the invention is described on the basis
of the figures and the terms in them. Acoustic echo
cancellation in stereo is considerably more difficult than
echo cancellation in mono, due to the strong correlation
between the stereo channels.

The invention is based on using a perceptual audio coder to
reduce the correlation between the stereo channels without
introducing audible distortion. As a result, the stereo
canceller converges towards the correct echo paths and
therefore gives a more stable echo cancellation which does
not depend on the transmission room (far-end). The core of
the invention is that the correlation can be reduced beyond
what the audio coder itself achieves, by modifying its
decoder. Extra noise, uncorrelated between the channels, is
added in the decoder to such an extent that it is not
audible, by using the information from the coder in
combination with an estimated perceptual masking threshold.

The solution is consequently flexible and does not require
the coding standard in use to be changed in any way. Only a
small number of operations need to be included in the
decoder.

The invention is based on the distortion being introduced
as noise added to the speech signal without interfering
with it. Further, the properties of the speech/audio coder
(for instance an MPEG coder) on the transmission channel,
C/D, between the near end and the far end are used. For
this purpose, a perceptual audio coder is used, which has
the effect that the channel correlation between the stereo
channels is reduced. With the MPEG Layer III coder, the
coherence falls below 0.95 for frequencies above 2 kHz. A
coherence below 0.95 is aimed at, in order to condition the
solution which the echo canceller is to find. At
frequencies below 2 kHz the coherence is still high, so
further modification of the signal is necessary in that
range. For this purpose, side information in the coded
signal is used, without introducing disturbing distortion.
Within each sub-band of the decoded signal, the utilization
of the signal is indicated, as well as which quantizer the
coder has used. The coder selects the quantizer on the
basis of the amount of energy in the analysed segment of
speech (or audio signal) and the so-called masking
threshold, which indicates the inaudible distortion levels
in the segment. The selection is made knowing that there is
often a margin to the just noticeable distortion level. The
remaining margin is used by adding noise, uncorrelated
between the channels, to the signal. By this measure, a
coherence reduction is attained which makes it possible to
find stable, unique estimates of the echo paths in the near
end, N.

The most advanced part of the MPEG-1 standard is layer III,
which typically compresses stereo sound up to 12 times
without significant loss of sound quality. It is included
in standards such as H.310 (broadband audiovisual
communication systems) and H.323 (visual telephone systems
and equipment for local networks). Layer III coders are
also commonly used as high-quality coders on the World Wide
Web (WWW).

The high compression is possible by removing parts of the
signal which are not audible or carry no information for
the ear. In simultaneous masking, larger frequency
components mask smaller ones in nearby frequency bands,
whereas in temporal masking, components just before or
after (in the time domain) a large sound component are
masked. The audio coder estimates the global masking
threshold, the just noticeable distortion, as a function of
frequency and time segment.

The sound decoder operates in parallel with the global
algorithm for estimation of the masking. The signal of the
sound source is divided into 32 critically down-sampled
band-pass signals in a filter bank. In layer III the
frequency selectivity is increased by processing each
band-pass signal with a modified discrete cosine transform
(MDCT). The length of the MDCT window is signal dependent
and is either 6 or 18, where the shorter window is used for
transients in the sound source. The MDCT components are
scaled and quantized after the decompression. The key for
the coder is that a sufficient number of quantizing levels
exists in each sub-band to keep the introduced quantizing
noise below the global masking threshold. The data
redundancy is reduced by applying Huffman coding to the
signal before it is transmitted over the channel.

When the two signals are not identical, the quantization
noise introduced in the two channels is almost independent.
As a result, the correlation between the channels is
reduced. Decoding is essentially performed in the same way
as the coding, but in reverse.

The correlation between the channels is reduced even more
if independent noise is added to the channels. Each of the
MDCT bands cannot be optimally quantized because of the
large overhead. They are instead divided into five ranges
with a defined number of quantizing levels. Define the
quantization-noise-to-mask relation (QMR) as the difference
between the level of the quantizing noise and the level
which is just audible in a given MDCT band. Noise which is
not audible can then be added to the MDCT components where
QMR is positive. In the frequency ranges where the channel
correlation needs to be reduced, the following is applied:

$$\mathrm{QMR}(j) > 0 \Rightarrow \tilde{X}_j^{\mathrm{mdct}} = X_j^{\mathrm{mdct}} + f(\mathrm{QMR}(j)) \cdot v$$

where $X_j^{\mathrm{mdct}}$ is the MDCT component in band j,
$\tilde{X}_j^{\mathrm{mdct}}$ is the component after the
noise addition, and $f(\cdot)$ amplifies the noise component
$v$ which is added. A block implementing this channel
decorrelation is added to the decoder just before the
inverse MDCT.
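The decorrelation block can be sketched as follows. The gain function `noise_gain`, the Gaussian noise source, the per-channel random generators and all numerical values are hypothetical; the description only states that f(.) amplifies a noise component which is added where QMR is positive.

```python
import random

def noise_gain(qmr):
    """Hypothetical f(.): the noise amplitude grows with the available margin."""
    return 0.01 * qmr

def decorrelate(mdct, qmr, rng, f=noise_gain):
    """Add noise to the MDCT components whose QMR is positive."""
    out = []
    for xj, qj in zip(mdct, qmr):
        if qj > 0:
            v = rng.gauss(0.0, 1.0)          # noise component v
            out.append(xj + f(qj) * v)
        else:
            out.append(xj)                   # no margin: leave untouched
    return out

# Hypothetical MDCT components (same in both channels) and per-band QMR.
mdct = [1.0, 0.5, -0.25, 0.1]
qmr = [3.0, 0.0, -2.0, 6.0]

# Separate generators per channel, so the added noise differs between channels.
left = decorrelate(mdct, qmr, random.Random(1))
right = decorrelate(mdct, qmr, random.Random(2))
```

Using a separate noise source for each channel is what makes the added noise uncorrelated between the channels; bands without a positive margin pass through unchanged.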

The global masking information is not accessible in the
decoder, but thanks to the high frequency resolution of the
MDCT, a global masking estimate with reduced computational
complexity can be produced. Independent noise is then
added, before the inverse MDCT, to the MDCT components
which have a sufficiently high SMR.

The invention is not restricted to the embodiment described
above, or to the following patent claims, but may be
subject to modifications within the scope of the idea of
the invention.
PATENT CLAIMS

1. Method for stereo acoustic echo cancellation, where the
echo is created on a connection for transmission of a
stereo acoustic signal, which signal is coded on the
transmitter side (F) and decoded on the receiver side (N),
characterized in that a perceptual audio coding is
introduced, that side information in the coded signal is
utilized, and that the echo can be identified and
cancelled.

2. Method according to patent claim 1, characterized in
that the perceptual coding is realized by, for instance,
MPEG coding, and that the perceptual coding allows the
channel correlation between the stereo channels to be
reduced.

3. Method according to patent claim 1 and 2, characterized
in that the perceptual coding is used with advantage at
frequencies exceeding 2 kHz.

4. Method according to patent claim 1, characterized in
that the side information preferably is utilized at
frequencies up to 2 kHz.

5. Method according to patent claim 1 and 4, characterized
in that, for the respective sub-band in the signal, the
utilization of the signal, and which quantizer is used at
the coding, are indicated.

6. Method according to patent claim 1, 4 and 5,
characterized in that the quantizer is selected at the
coding on the basis of an analysed segment of the signal,
and that a masking threshold, indicating inaudible
distortion levels within the segment, is selected.

7. Method according to patent claim 1, 4, 5 and 6,
characterized in that the selection of the masking
threshold is made so that a margin to a just noticeable
distortion is attained, and in that uncorrelated noise
between the channels is added to the margin.

8. Device for stereo acoustic echo cancellation, where a
signal is registered by sound registering equipment on the
transmitter side (F), and the signal is coded in a coder
(C) and transmitted on a connection to a decoder (D) on the
receiver side (N), characterized in that the coder (C) is
arranged to perform a perceptual coding of the signal, that
side information in the coded signal is utilized, and that
a stereo acoustic echo canceller is arranged to identify
the echo and reduce it.

9. Device according to patent claim 8, characterized in
that the perceptual coding is performed in the coder (C),
and that, for instance, MPEG coding is utilized for
reduction of the channel correlation between the channels.

10. Device according to patent claim 8, characterized in
that the stereo acoustic echo canceller (AEC) is arranged
to analyse a segment of the signal to determine a masking
threshold defining inaudible distortion levels within the
segment.

11. Device according to patent claim 8 or 10, characterized
in that the coder (C) is arranged to select the quantizer.

12. Device according to patent claim 8, 9, 10 or 11,
characterized in that the stereo acoustic echo canceller
(AEC) is arranged to select the masking threshold so that a
margin to a just noticeable distortion is attained, and
that the stereo acoustic echo canceller (AEC) is arranged
to add noise, uncorrelated between the channels, to the
signal.



WO 99/22460 



PCT/SE98/01858 




Figure 2 